86 research outputs found

    A binaural grouping model for predicting speech intelligibility in multitalker environments

    Spatially separating speech maskers from target speech often leads to a large intelligibility improvement. Modeling this phenomenon has long been of interest to binaural-hearing researchers for uncovering brain mechanisms and for improving signal-processing algorithms in hearing-assistive devices. Much of the previous binaural modeling work focused on the unmasking enabled by binaural cues at the periphery, and little quantitative modeling has been directed toward the grouping or source-separation benefits of binaural processing. In this article, we propose a binaural model that focuses on grouping, specifically on the selection of time-frequency units that are dominated by signals from the direction of the target. The proposed model uses Equalization-Cancellation (EC) processing with a binary decision rule to estimate a time-frequency binary mask. EC processing is carried out to cancel the target signal, and the energy change between the EC input and output is used as a feature that reflects target dominance in each time-frequency unit. The processing in the proposed model requires few computational resources and is straightforward to implement. In combination with the Coherence-based Speech Intelligibility Index, the model is applied to predict the speech intelligibility data measured by Marrone et al. The predicted speech reception threshold matches the pattern of the measured data well, even though the predicted intelligibility improvements relative to the colocated condition are larger than some of the measured data, which may reflect the lack of internal noise in this initial version of the model. R01 DC000100 - NIDCD NIH HHS
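The mask-estimation idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the target sits straight ahead, so equalization is the identity and cancellation reduces to subtracting the right-ear signal from the left-ear signal within each time-frequency unit. The function names and the 6 dB threshold are hypothetical choices made for illustration only.

```python
import math

def energy(samples):
    """Sum of squared samples in one time-frequency unit."""
    return sum(x * x for x in samples)

def ec_energy_drop_db(left, right, eps=1e-12):
    """Energy change (dB) between EC input and EC output for one unit.

    Assumption: the target has zero interaural time and level difference,
    so cancelling it is a plain left-minus-right subtraction.
    """
    residual = [l - r for l, r in zip(left, right)]
    e_in = energy(left) + energy(right)
    e_out = energy(residual)
    # eps keeps the ratio finite when the cancellation is perfect.
    return 10.0 * math.log10((e_in + eps) / (e_out + eps))

def binary_mask(units, threshold_db=6.0):
    """Binary decision rule: 1 where cancelling the target removes a lot
    of energy (target-dominated unit), 0 otherwise.

    `units` is a list of rows; each row is a list of (left, right) pairs.
    The threshold value is a hypothetical placeholder.
    """
    return [
        [1 if ec_energy_drop_db(l, r) > threshold_db else 0 for (l, r) in row]
        for row in units
    ]
```

A unit whose two ear signals are identical is cancelled almost completely and is marked 1. A unit whose ear signals are out of phase gains energy after subtraction and is marked 0.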

    Cortical transformation of spatial processing for solving the cocktail party problem: a computational model.

    In multisource, "cocktail party" sound environments, human and animal auditory systems can use spatial cues to effectively separate and follow one source of sound over competing sources. While mechanisms to extract spatial cues such as interaural time differences (ITDs) are well understood in precortical areas, how such information is reused and transformed in higher cortical regions to represent segregated sound sources is not clear. We present a computational model describing a hypothesized neural network that spans spatial cue detection areas and the cortex. This network is based on recent physiological findings that cortical neurons selectively encode target stimuli in the presence of competing maskers based on source locations (Maddox et al., 2012). We demonstrate that key features of cortical responses can be generated by the model network, which exploits spatial interactions between inputs via lateral inhibition, enabling the spatial separation of target and interfering sources while allowing monitoring of a broader acoustic space when there is no competition. We present the model network along with testable experimental paradigms as a starting point for understanding the transformation and organization of spatial information from midbrain to cortex. This network is then extended to suggest engineering solutions that may be useful for hearing-assistive devices in solving the cocktail party problem. R01 DC000100 - NIDCD NIH HHS. Published version.

    A physiologically inspired model for solving the cocktail party problem.

    At a cocktail party, we can broadly monitor the entire acoustic scene to detect important cues (e.g., our names being called, or the fire alarm going off), or selectively listen to a target sound source (e.g., a conversation partner). It has recently been observed that individual neurons in the avian field L (analogous to the mammalian auditory cortex) can display broad spatial tuning to single targets and selective tuning to a target embedded in spatially distributed sound mixtures. Here, we describe a model inspired by these experimental observations and apply it to process mixtures of human speech sentences. This processing is realized in the neural spiking domain. It converts binaural acoustic inputs into cortical spike trains using a multi-stage model composed of a cochlear filter-bank, a midbrain spatial-localization network, and a cortical network. The output spike trains of the cortical network are then converted back into an acoustic waveform, using a stimulus reconstruction technique. The intelligibility of the reconstructed output is quantified using an objective measure of speech intelligibility. We apply the algorithm to single and multi-talker speech to demonstrate that the physiologically inspired algorithm is able to achieve intelligible reconstruction of an "attended" target sentence embedded in two other non-attended masker sentences. The algorithm is also robust to masker level and displays performance trends comparable to humans. The ideas from this work may help improve the performance of hearing assistive devices (e.g., hearing aids and cochlear implants), speech-recognition technology, and computational algorithms for processing natural scenes cluttered with spatially distributed acoustic objects. R01 DC000100 - NIDCD NIH HHS. Published version.

    A Modeling Study of the Responses of the Lateral Superior Olive to Ipsilateral Sinusoidally Amplitude-Modulated Tones

    The lateral superior olive (LSO) is a brainstem nucleus that is classically understood to encode binaural information in high-frequency sounds. Previous studies have shown that LSO cells are sensitive to envelope interaural time difference in sinusoidally amplitude-modulated (SAM) tones (Joris and Yin, J Neurophysiol 73:1043–1062, 1995; Joris, J Neurophysiol 76:2137–2156, 1996) and that a subpopulation of LSO neurons exhibit low-threshold potassium currents mediated by Kv1 channels (Barnes-Davies et al., Eur J Neurosci 19:325–333, 2004). It has also been shown that in many LSO cells the average response rate to ipsilateral SAM tones decreases with modulation frequency above a few hundred Hertz (Joris and Yin, J Neurophysiol 79:253–269, 1998). This low-pass feature is not directly inherited from the inputs to the LSO since the response rate of these input neurons changes little with increasing modulation frequency. In the current study, an LSO cell model is developed to investigate mechanisms consistent with the responses described above, notably the emergent rate decrease with increasing frequency. The mechanisms explored included the effects of after-hyperpolarization (AHP) channels, the dynamics of low-threshold potassium channels (KLT), and the effects of background inhibition. In the model, AHP channels alone were not sufficient to induce the observed rate decrease at high modulation frequencies. The model also suggests that the background inhibition alone, possibly from the medial nucleus of the trapezoid body, can account for the small rate decrease seen in some LSO neurons, but could not explain the large rate decrease seen in other LSO neurons at high modulation frequencies. In contrast, both the small and large rate decreases were replicated when KLT channels were included in the LSO neuron model. These results support the conclusion that KLT channels may play a major role in the large rate decreases seen in some units and that background inhibition may be a contributing factor, a factor that could be adequate for small decreases.

    Communications Biophysics

    Contains reports on five research projects. National Institutes of Health (Grant 1 P01 GM-14940-01); Joint Services Electronics Program under Contract DA 28-043-AMC-02536(E)

    Interaural correlation sensitivity

    Sensitivity to differences in interaural correlation was measured as a function of reference interaural correlation and frequency (250 to 1500 Hz) for narrowband-noise stimuli (1.3 ERBs wide) and for the same stimuli spectrally fringed by broadband correlated noise. d' was measured for two-interval discriminations between fixed pairs of correlation values, and these measurements were used to generate cumulative d' versus correlation curves for each stimulus frequency and type. The perceptual cue reported by subjects was perceived intracranial breadth for narrowband stimuli (wider image for lower correlation) and loudness of a whistling sound heard at the frequency of the decorrelated band for the fringed stimuli (louder for lower correlation). At low correlations, sensitivity was greater for fringed than for narrowband stimuli at all frequencies, but at higher correlations, sensitivity was often greater for narrowband stimuli. For fringed stimuli, cumulative sensitivity was greater at low frequencies than at high frequencies, but listeners produced varied patterns for narrowband stimuli. The forms of cumulative d' curves as a function of frequency were interpolated using an eight-parameter fitted function. Such functions may be used to predict listeners' perceptions of stimuli that vary across frequency in interaural correlation.

    Interaural correlation sensitivity

    Sensitivity to differences in interaural correlation was measured for 1.3-ERB-wide bands of noise using a 2IFC task at six frequencies: 250, 500, 750, 1000, 1250, and 1500 Hz. The sensitivity index, d′, was measured for discriminations between a number of fixed pairs of correlation values. Cumulative d′ functions were derived for each frequency and condition. The d′ for discriminating any two values of correlation may be recovered from the cumulative d′ function by the difference between cumulative d′’s for these values. Two conditions were employed: the noisebands were either presented in isolation (narrow-band condition) or in the context of broad, contiguous flanking bands of correlated noise (fringed condition). The cumulative d′ functions showed greater sensitivity to differences in correlation close to 1 than close to 0 at low frequencies, but this difference was less pronounced in the fringed condition. Also, a more linear relationship was observed when cumulative d′ was plotted as a function of the equivalent signal-to-noise ratio (SNR) in dB for each correlation value, rather than directly against correlation. The equivalent SNR was the SNR at which the interaural correlation in an NoSπ stimulus would equal the interaural correlation of the noise used in the experiment. The maximum cumulative d′ declined above 750 Hz. This decline was steeper for the fringed than for the narrow-band condition. For the narrow-band condition, the total cumulative d′ was variable across listeners. All cumulative d′ functions were closely fitted using a simple two-parameter function. The complete data sets, averaged across listeners, from the fringed and narrow-band conditions were fitted using functions to describe the changes in these parameters over frequency, in order to produce an interpolated family of curves that describe sensitivity at frequencies between those tested. These curves predict the spectra recovered by the binaural system when complex sounds, such as speech, are masked by noise.
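Two relationships in this abstract lend themselves to a short worked sketch. First, the d′ for any pair of correlations is the difference between their cumulative d′ values. Second, under the usual assumption that the signal and the noise are uncorrelated, an NoSπ stimulus with signal power S and noise power N has interaural correlation (N − S)/(N + S), so the equivalent SNR for a correlation ρ is (1 − ρ)/(1 + ρ). The function names below are hypothetical, and `toy_cumulative` is a made-up placeholder curve, not the two-parameter function fitted in the paper.

```python
import math

def equivalent_snr_db(rho):
    """SNR (dB) at which an NoS-pi stimulus has interaural correlation rho.

    Assumes uncorrelated signal and noise, so rho = (N - S) / (N + S)
    and therefore S / N = (1 - rho) / (1 + rho).
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return 10.0 * math.log10((1.0 - rho) / (1.0 + rho))

def d_prime_between(cumulative, rho_a, rho_b):
    """d' for discriminating two correlations, recovered as the
    difference between their cumulative d' values."""
    return abs(cumulative(rho_a) - cumulative(rho_b))

def toy_cumulative(rho):
    """Hypothetical cumulative d' curve, for illustration only."""
    return 4.0 * rho * rho
```

For example, `equivalent_snr_db(0.0)` is 0 dB because equal signal and noise power gives zero correlation, and `d_prime_between(toy_cumulative, 0.5, 1.0)` is 3.0 for the placeholder curve.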

    Communications Biophysics

    Contains reports on three research projects. National Institutes of Health (Grant 5 P01 GM14940-04)

    Sensory Communication

    Contains a list of research projects split into seven sections, listing researchers and grants. National Science Foundation (Grant BNS 84-11392); National Institutes of Health (Grant 5 R01 NS10916); National Institutes of Health (Grant 5 R01 NS12846); National Institutes of Health (Grant 5 R01 NS14902); National Science Foundation (Grant BNS 84-17817); National Institutes of Health (Grant 1 R01 NS21322); National Institutes of Health (Grant 1 P01 NS23734); National Science Foundation (Grant DMC 83-32460)

    Auditory Psychophysics and Aids for the Deaf

    Contains the table of contents for Section 2 and a list of eight research projects, including sponsors and principal investigators for each. National Science Foundation (Grant BNS 84-11392); National Institutes of Health (Grant 5 R01 NS10916); National Institutes of Health (Grant 5 R01 NS12846); National Institutes of Health (Grant 5 R01 NS14902); National Science Foundation (Grant BNS 84-17817); National Institutes of Health (Grant 2 R01 NS21322); National Institutes of Health (Grant P01 NS23734); National Science Foundation (Grant DMC 83-32460)